Tuning MPI Collectives by Verifying Performance Guidelines
Authors
Abstract
MPI collective operations provide a standardized interface for performing data movements within a group of processes. The efficiency of collective communication operations depends on the actual algorithm, its implementation, and the specific communication problem (type of communication, message size, number of processes). Many MPI libraries provide numerous algorithms for specific collective operations. The strategy for selecting an efficient algorithm is often predefined (hard-coded) in MPI libraries, but some of them, such as Open MPI, allow users to change the algorithm manually. Finding the best algorithm for each case is a hard problem, and several approaches to tune these algorithmic parameters have been proposed. We take an orthogonal approach to the parameter tuning of MPI collectives: instead of testing individual algorithmic choices provided by an MPI library, we compare the latency of a specific MPI collective operation to the latency of semantically equivalent functions, which we call mock-up implementations. The structure of the mock-up implementations is defined by self-consistent performance guidelines. The advantage of this approach is that tuning using mock-up implementations is always possible, whether or not an MPI library allows users to select a specific algorithm at run-time. We implement this concept in a library called PGMPITuneLib, which is layered between the user code and the actual MPI implementation. This library selects the best-performing algorithmic pattern of an MPI collective by intercepting MPI calls and redirecting them to our mock-up implementations. Experimental results show that PGMPITuneLib can significantly reduce the latency of MPI collectives and, equally important, that it can help identify the tuning potential of MPI libraries.
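To make the layering idea concrete, the sketch below shows one way such an interception could look using the standard MPI profiling interface (PMPI): MPI_Allgather is re-defined and, when a tuning switch says so, redirected to a mock-up built from MPI_Gather followed by MPI_Bcast, mirroring the self-consistent guideline that MPI_Allgather should not be slower than MPI_Gather plus MPI_Bcast. This is not PGMPITuneLib's actual code; the use_mockup flag and the choice of collective are illustrative assumptions.

#include <mpi.h>

/* Hypothetical switch: in a real tuning library this decision would be
   derived from prior benchmark data per message size and process count. */
static int use_mockup = 1;

int MPI_Allgather(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
                  void *recvbuf, int recvcount, MPI_Datatype recvtype,
                  MPI_Comm comm)
{
    if (!use_mockup) {
        /* Fall through to the MPI library's native algorithm. */
        return PMPI_Allgather(sendbuf, sendcount, sendtype,
                              recvbuf, recvcount, recvtype, comm);
    }

    /* Mock-up following the guideline MPI_Allgather <= MPI_Gather + MPI_Bcast:
       gather all blocks to rank 0, then broadcast the assembled buffer. */
    int nprocs, rc;
    PMPI_Comm_size(comm, &nprocs);
    rc = PMPI_Gather(sendbuf, sendcount, sendtype,
                     recvbuf, recvcount, recvtype, 0, comm);
    if (rc != MPI_SUCCESS)
        return rc;
    return PMPI_Bcast(recvbuf, nprocs * recvcount, recvtype, 0, comm);
}

In practice, the choice between the native call and each mock-up would come from benchmarking both variants for the relevant message sizes and process counts, which is exactly the data that verifying the performance guidelines produces.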
Similar Articles
Flexible collective communication tuning architecture applied to Open MPI
Collective communications are invaluable to modern high performance applications, although most users of these communication patterns do not always want to know their innermost workings. The implementation of the collectives is often left to the middleware developer, such as those providing an MPI library. As many of these libraries are designed to be both generic and portable, the MPI develope...
Evaluation of Optimized Barrier Algorithms for SCI Networks with Different MPI Implementations
The SCI Collectives Library is a new software package which implements optimized collective communication operations on SCI networks. It is designed to be coupled to different higher-level communication libraries (especially MPI implementations) by adapter modules, thereby giving them access to these optimized collectives. In this work, we present the design of the SCI Collectives Library and o...
Hierarchical Collectives in MPICH2
Most parallel systems on which MPI is used are now hierarchical: some processors are much closer to others in terms of interconnect performance. One of the most common examples is systems whose nodes are symmetric multiprocessors (including “multicore” processors). A number of papers have developed algorithms and implementations that exploit shared memory on such nodes to provide optimize...
Optimizing MPI Collectives for X1
Traditionally, MPI collective operations have been based on point-to-point messages, with possible optimizations for system topologies and communication protocols. The Cray X1 scatter/gather hardware and shared-memory mapping features allow for significantly different approaches to MPI collectives, leading to substantial performance gains over standard methods, especially for short message length...
PGMPI: Automatically Verifying Self-Consistent MPI Performance Guidelines
The Message Passing Interface (MPI) is the most commonly used application programming interface for process communication on current large-scale parallel systems. Due to the scale and complexity of modern parallel architectures, it is becoming increasingly difficult to optimize MPI libraries, as many factors can influence the communication performance. To assist MPI developers and users, we pro...
Journal: CoRR
Volume: abs/1707.09965
Publication date: 2017